The bayesynergy R package implements a Bayesian semi-parametric regression model for estimating the dose-response function of in-vitro drug combination experiments. The Bayesian framework offers full uncertainty quantification of the dose response function and any derived summary statistics, as well as natural handling of replicates and missing data. The Bayesian model is implemented in Stan (Stan Development Team (2020)), taking advantage of the efficient ‘No U-Turn Sampler’ as well as variational Bayes for quick approximations of the true posterior.
The package is further equipped with plotting functions for summarizing a single drug-response experiment, as well as parallel processing and summary and comparison tools for large drug combination screens.
The dose-response function \(f:\boldsymbol{x} \to (0,1)\), maps drug concentrations \(\boldsymbol{x}\) to a measure of cell viability – zero corresponding to all cells being dead after treatment, one corresponding to all cells still alive. In drug-combination screens, it is common to assume that the dose-response function can be broken down as \[ f(\boldsymbol{x}) = p_0(\boldsymbol{x})+\Delta(\boldsymbol{x}), \] where \(p_0(\boldsymbol{x})\) encodes a non-interaction assumption, and \(\Delta(\boldsymbol{x})\) captures the residual interaction effect.
The non-interaction assumption, \(p_0(\boldsymbol{x})\), captures what can reasonably be assumed about a joint drug effect, given estimates of the drugs’ individual effects. We assume a Bliss-style independence assumption, where we first assume that each individual drug’s dose-response function takes the form of a log-logistic curve \[ h_i(x_i|l,s,m) = l + \frac{1-l}{1+10^{s(x_i-m)}}, \] where \(l\) is the lower asymptote, \(s\) the slope, and \(m\) the drug’s EC50 on the \(\log_{10}\) scale. The Bliss assumption then amounts to a probabilistic independence assumption, where \[ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2). \] We call it probabilistic because we can interpret the individual dose-response curves \(h_i()\) as probabilities of cell survival. Defining the events \[ \begin{align} A_i & = \text{A cell survives drug A at concentration $x_{1i}$} \\ B_j & = \text{A cell survives drug B at concentration $x_{2j}$} \\ C_{ij} & = \text{A cell survives both drugs at concentration $\boldsymbol{x}=(x_{1i},x_{2j})$}, \end{align} \] the corresponding probabilities become \[ p_0(\boldsymbol{x}) = P(C_{ij}) = P(A_i)P(B_j) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2). \]
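As a concrete illustration, the log-logistic curve and the Bliss product can be sketched in a few lines of R (the function names and parameter values below are illustrative, not part of the package):

```r
# Sketch: log-logistic monotherapy curve and Bliss non-interaction surface.
# Parameter values here are made up for illustration.
log_logistic <- function(x, l, s, m) {
  # x: log10 concentration; l: lower asymptote; s: slope; m: log10 EC50
  l + (1 - l) / (1 + 10^(s * (x - m)))
}

# Bliss independence: the joint survival probability is the product
p0 <- function(x1, x2, par1, par2) {
  log_logistic(x1, par1["l"], par1["s"], par1["m"]) *
    log_logistic(x2, par2["l"], par2["s"], par2["m"])
}

par1 <- c(l = 0.1, s = 1.0, m = 0.0)
par2 <- c(l = 0.2, s = 1.5, m = 0.5)
p0(0, 0.5, par1, par2)  # survival probability at (x1, x2) = (0, 0.5)
```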
The interaction component, \(\Delta(\boldsymbol{x})\), captures any joint effect of the drugs that is not captured by the non-interaction assumption. If two drugs are more effective together than would be expected under \(p_0\), we call this synergy, which corresponds to \(\Delta < 0\). The opposite effect is called antagonism.
Because the interaction landscape can be complex, with multiple local peaks and valleys, we model this term non-parametrically using a Gaussian process (GP) prior. To ensure that the resulting dose-response function only takes values in the interval \((0,1)\), we push the GP through a transformation function \(g()\). That is \[ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \] where the transformation function is given by \[ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}}. \] In addition to ensuring the proper bounds for the dose-response function, this transformation has the feature that \(g(0)=0\), which corresponds to an a priori assumption that \[ \mathbb{E}\left[f(\boldsymbol{x}) | p_0(\boldsymbol{x})\right] \approx p_0(\boldsymbol{x}). \] That is, we make our non-interaction assumption into a formal prior expectation on the dose-response function. This achieves two things: (1) a slightly conservative model that needs to be convinced that interaction effects are present, and (2) no built-in bias toward interaction in the prior structure.
The covariance function \(\kappa(\boldsymbol{x},\boldsymbol{x}')\) can be given multiple specifications, including a squared exponential, Matérn, and Rational Quadratic covariance functions. By default, we use a Matérn covariance with the \(\nu\) parameter set to 3/2 yielding \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}. \] Finally, by utilizing the natural grid structure of the drug concentrations, we can write the kernel function as \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2 \kappa(x_1,x_1')\kappa(x_2,x_2'), \] which induces a Kronecker product structure on the final covariance matrix. Following the implementation detailed in Flaxman et al. (2015), this greatly improves the computational efficiency of the model.
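The Kronecker structure can be illustrated directly in R (a sketch with made-up grid sizes and hyperparameter values):

```r
# Sketch: Matern 3/2 covariance on a concentration grid, exploiting the
# Kronecker product structure. Grid and hyperparameter values are illustrative.
matern32 <- function(x, xp, ell) {
  d <- abs(outer(x, xp, "-"))
  (1 + sqrt(3) * d / ell) * exp(-sqrt(3) * d / ell)
}

x1 <- seq(-2, 2, length.out = 5)   # log10 concentrations, drug 1
x2 <- seq(-1, 1, length.out = 4)   # log10 concentrations, drug 2
sigma_f2 <- 1
ell <- 1

K1 <- matern32(x1, x1, ell)        # 5 x 5
K2 <- matern32(x2, x2, ell)        # 4 x 4
K  <- sigma_f2 * kronecker(K1, K2) # full 20 x 20 covariance matrix
dim(K)  # 20 20
```

Working with the small per-drug matrices rather than the full Kronecker product is what yields the computational savings of Flaxman et al. (2015).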
Given the above formulation for the dose-response function \(f\), we assume that we have access to noisy observations from it. These observations are typically generated from various cellular assays, e.g. viability assays. In particular, we assume that given concentration points \(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_n\) we have observations \(y_1,\ldots,y_n\), where \[ y_i = f(\boldsymbol{x}_i) + \epsilon_i, \] and the errors \(\epsilon_i\) are normally distributed with mean zero. By default, we model the variance of the observational errors in a heteroscedastic fashion as \[ \text{Var}\left[\epsilon_i\right] = \sigma^2(f(\boldsymbol{x}_i)+\lambda), \] where \(\lambda\) is set to a small value to handle the case when \(f = 0\), but there is still some residual noise. In a typical setup where cell viability is calculated through a normalization to positive and negative controls, \(\lambda\) can be empirically set as \[ \lambda = \frac{\sigma^2_{+}}{\sigma^2_{-}}, \] where \(\sigma^2_{+}\) and \(\sigma^2_{-}\) denote the variance of positive and negative controls, respectively.
We choose a heteroscedastic model by default, because in cell viability assays, the observations are normalized in relation to positive and negative controls. The positive controls typically have much lower variance compared to the negative controls, which translates to viability measures closer to zero being more precisely measured. We also allow homoscedastic noise as an option.
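Given raw control readings, the empirical choice of \(\lambda\) is a one-liner (the control vectors below are hypothetical):

```r
# Sketch: empirical estimate of lambda from plate controls.
# The control readings here are hypothetical.
pos_controls <- c(0.02, 0.03, 0.01, 0.02)  # positive controls (cells killed)
neg_controls <- c(0.95, 1.05, 1.00, 0.98)  # negative controls (untreated)
lambda <- var(pos_controls) / var(neg_controls)
lambda
```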
The full model specification, with all default prior distributions, reads \[ y_i \sim \mathcal{N}\left(f(\boldsymbol{x}_i),\sigma^2(f(\boldsymbol{x}_i)+\lambda)\right), \ i = 1,\ldots, n \\ \sigma \sim \text{Inv-Ga}\left(5,1\right), \ \lambda = 0.005 \\ f(\boldsymbol{x}_i) = p_0(\boldsymbol{x}_i)+\Delta(\boldsymbol{x}_i) \, \mathbb{I}(\boldsymbol{x}_i>0) \\ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2) \\ l_j \sim \text{Beta}(1,1.25), \ s_j \sim \text{Gamma}(1,1), \ j = 1,2 \\ m_j \sim \mathcal{N}(\theta_j,\sigma_{m_j}^2), \ j = 1,2 \\ \theta_j \sim \mathcal{N}(0,1), \ \sigma_{m_j}^2 \sim \text{Inv-Ga}\left(3,2\right), \ j = 1,2 \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} \\ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}, \\ \sigma_f^2 \sim \text{log-}\mathcal{N}(1,1), \ \ell \sim \text{Inv-Ga}(5,5) \\ b_j \sim \mathcal{N}(1,0.1^2), \ j = 1,2. \] Note that some of these specifications can be altered. For example, by default we estimate the lower asymptotes, but they can also be fixed equal to zero.
In the model specification above, the interaction term is multiplied with an indicator function \(\mathbb{I}(\boldsymbol{x}>0)\) taking the value 1 if and only if all elements of \(\boldsymbol{x}\) are strictly larger than zero. This ensures that no interaction is allowed when one of the drugs is at zero concentration.
From the posterior dose-response function \(f | \mathbf{y}\), we derive a number of summary statistics concerning efficacy, synergy and antagonism.
For the monotherapy curves, we produce estimates of the drug sensitivity score (DSS) of each drug by the integral
\[ DSS_0 = \int_a^b 1-h_j(x) \ \text{d}x, \] where \(a=h_j^{-1}(1-\epsilon/2)\) and \(b=\max(x_j)\). That is, the integral is taken from where \(h_j\) crosses an ‘activation threshold’ \(1-\epsilon/2\), to the maximum concentration of the drug measured in the screen. The activation threshold \(\epsilon\) is set to 5%. This value is further standardized by the total volume available for drug efficacy, \[ DSS = \frac{DSS_0}{(1-\epsilon/2)(b-a)}. \] From here, values can be further standardized as in Yadav et al. (2014).
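A numerical sketch of the DSS computation, assuming the log-logistic form with illustrative parameter values:

```r
# Sketch: numerical DSS for one monotherapy curve. The curve parameters and
# concentration range are made up for illustration.
h <- function(x, l = 0.1, s = 1, m = 0) l + (1 - l) / (1 + 10^(s * (x - m)))

eps <- 0.05
b   <- 2                                        # max log10 concentration in screen
# find a, where h crosses the activation threshold 1 - eps/2
a    <- uniroot(function(x) h(x) - (1 - eps / 2), c(-6, b))$root
dss0 <- integrate(function(x) 1 - h(x), a, b)$value
dss  <- dss0 / ((1 - eps / 2) * (b - a))        # standardize to available volume
dss
```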
To summarize the total dose-response function, we utilize the measures developed in Cremaschi et al. (2019). As with the DSS scores, we calculate the area above the response function, and denote this the ‘residual volume under the surface’ or rVUS, as it is the volume left over from integrating the surface.
In general, the integral looks like
\[ rVUS_0(f) = \int_a^b \int_c^d 1-f(\mathbf{x}) \ \text{d}\mathbf{x} \] where the integrals are taken over the observed drug range, i.e. \(a = \min (x_1)\), \(b = \max (x_1)\), \(c = \min (x_2)\), \(d = \max (x_2)\).
From here, we further standardise the rVUS scores to the total volume available for efficacy
\[ rVUS(f) = \frac{rVUS_0(f)}{(b-a)(d-c)}. \] The model calculates \(rVUS\) for the dose-response function \(f\), giving a measure of combined efficacy. In addition, we calculate \(rVUS(p_0)\), the non-interaction efficacy, and \(rVUS(\Delta)\), the interaction efficacy. For the interaction term \(\Delta\), we also compute \(rVUS(\Delta^{-})\) and \(rVUS(\Delta^{+})\) for synergy and antagonism, where \(\Delta^{+}\) and \(\Delta^{-}\) denote the positive and negative parts of \(\Delta\), respectively. That is,
\[ \Delta^{+}(\mathbf{x}) = \max(0,\Delta(\mathbf{x})) \\ \Delta^{-}(\mathbf{x}) = \max(0,-\Delta(\mathbf{x})). \]
When running screens with a large number of drug combinations, it is helpful to have a normalised measure for comparing synergy across experiments. The \(rVUS\) scores defined above are already standardized to their drug concentration range, but to compare across experiments, we also standardize with respect to the uncertainty in the model. To do this, we calculate a synergy score by normalizing \(rVUS(\Delta^{-})\) with respect to its standard deviation. \[ \text{Synergy score} = \frac{\text{mean}(rVUS(\Delta^{-}))}{\text{sd}(rVUS(\Delta^{-}))}. \]
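A sketch of how such a score could be computed from posterior draws of \(\Delta^{-}\) on the dose grid, via trapezoidal integration (all objects here are made up for illustration; the package computes these quantities internally):

```r
# Sketch: rVUS by trapezoidal integration on the dose grid, and the synergy
# score across posterior draws. All inputs below are simulated placeholders.
trapz2d <- function(x1, x2, z) {
  # z: matrix of values on the grid, rows indexed by x1, cols by x2
  w1 <- diff(x1); w2 <- diff(x2)
  inner <- (z[-1, -1] + z[-1, -ncol(z)] +
            z[-nrow(z), -1] + z[-nrow(z), -ncol(z)]) / 4
  sum(outer(w1, w2) * inner)
}

x1 <- seq(0, 2, length.out = 6)
x2 <- seq(0, 2, length.out = 6)

# hypothetical posterior draws of Delta^- (the negative part) on the 6 x 6 grid
set.seed(1)
draws <- replicate(100, matrix(pmax(0, rnorm(36, 0.1, 0.05)), 6, 6))

# rVUS(Delta^-) per draw, standardized to the concentration ranges (2 x 2)
rvus_syn <- apply(draws, 3, function(d) trapz2d(x1, x2, d) / (2 * 2))
synergy_score <- mean(rvus_syn) / sd(rvus_syn)
```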
In the R package, we’ve attached two example datasets from a large drug combination screening experiment on diffuse large B-cell lymphoma. We’ll use these to show some simple use cases of the main functions and how to interpret the results.
Let’s load in the first example and have a look at it
library(bayesynergy)
data("mathews_DLBCL")
y = mathews_DLBCL[[1]][[1]]
x = mathews_DLBCL[[1]][[2]]
head(cbind(y,x))
##       Viability ibrutinib ispinesib
## [1,] 1.2295618 0.0000 0
## [2,] 1.0376006 0.1954 0
## [3,] 1.1813851 0.7812 0
## [4,] 0.5882688 3.1250 0
## [5,] 0.4666700 12.5000 0
## [6,] 0.2869514 50.0000 0
We see that the measured viability scores are stored in the vector y, while x is a matrix with two columns giving the corresponding concentrations at which the viability scores were read off.
Fitting the regression model is simple, and can be done on default settings by running the following code (where we add the names of the drugs involved, as well as the concentration units, for plotting purposes).
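A sketch of such a call (the fitted object name fit, the units argument name, and the concentration units are assumptions for illustration):

```r
# Sketch of a bayesynergy call on the example data; the `units` argument
# name and the unit strings are assumptions.
library(bayesynergy)
fit = bayesynergy(y, x,
                  drug_names = c("ibrutinib", "ispinesib"),
                  units = c("\u00b5M", "\u00b5M"))
```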
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000173 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.73 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 2.32587 seconds (Warm-up)
## Chain 1: 2.09946 seconds (Sampling)
## Chain 1: 4.42533 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 6.2e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.62 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 2.1948 seconds (Warm-up)
## Chain 2: 2.18928 seconds (Sampling)
## Chain 2: 4.38408 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 9.8e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.98 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 2.29999 seconds (Warm-up)
## Chain 3: 2.03527 seconds (Sampling)
## Chain 3: 4.33527 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 6.3e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 2.3041 seconds (Warm-up)
## Chain 4: 3.97377 seconds (Sampling)
## Chain 4: 6.27787 seconds (Total)
## Chain 4:
The resulting model can be summarised by running
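Assuming the fitted object is stored as fit, a sketch of the call:

```r
# Posterior summaries of the fit; `fit` is assumed to hold the bayesynergy fit.
summary(fit)
```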
## mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
## la_1[1] 0.327 0.001971 0.0821 1.31e-01 3.39e-01 0.460 1735 1.00
## la_2[1] 0.377 0.007665 0.0813 7.07e-02 3.94e-01 0.461 112 1.04
## log10_ec50_1 0.490 0.004224 0.1746 2.27e-01 4.55e-01 0.922 1708 1.00
## log10_ec50_2 -0.997 0.020143 0.9876 -3.30e+00 -8.61e-01 0.470 2404 1.00
## slope_1 1.966 0.017386 0.9277 8.26e-01 1.74e+00 4.318 2847 1.00
## slope_2 1.406 0.038046 1.0922 7.64e-02 1.15e+00 4.249 824 1.00
## ell 3.066 0.032462 1.5579 1.26e+00 2.69e+00 7.126 2303 1.00
## sigma_f 0.839 0.015422 0.7880 1.67e-01 6.07e-01 2.925 2611 1.00
## s 0.105 0.000272 0.0152 8.04e-02 1.03e-01 0.139 3106 1.00
## dss_1 37.767 0.113290 5.6949 2.68e+01 3.77e+01 48.498 2527 1.00
## dss_2 43.230 0.605074 8.5854 2.16e+01 4.48e+01 55.759 201 1.02
## rVUS_f 82.783 0.014512 0.9425 8.08e+01 8.28e+01 84.564 4218 1.00
## rVUS_p0 73.074 0.036268 2.4060 6.82e+01 7.32e+01 77.522 4401 1.00
## rVUS_Delta -9.708 0.038297 2.5846 -1.48e+01 -9.59e+00 -4.984 4555 1.00
## rVUS_syn 9.765 0.037608 2.5338 5.12e+00 9.64e+00 14.872 4539 1.00
## rVUS_ant 0.057 0.002377 0.1326 5.01e-06 8.59e-05 0.462 3113 1.00
##
## log-Pseudo Marginal Likelihood (LPML) = 52.52761
which gives posterior summaries of the parameters of the model. In addition, the model calculates summary statistics of the monotherapy curves and the dose-response surface, including drug sensitivity scores (DSS) for the two drugs in question, as well as the residual volumes under the surface (rVUS) that capture the notions of combined efficacy (rVUS_f), non-interaction efficacy (rVUS_p0), interaction (rVUS_Delta), synergy (rVUS_syn) and antagonism (rVUS_ant).
As indicated, the total combined drug efficacy is around 80% (rVUS_f), of which around 70 percentage points can be attributed to \(p_0\) (rVUS_p0), leaving room for 10 percentage points worth of synergy (rVUS_syn). We can also note that the model is fairly certain of this effect, with a 95% credible interval given as (5.118, 14.872).
We can also create plots by simply running
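Assuming again that the fit is stored as fit, a sketch:

```r
# Default plots of the fit; `fit` is assumed to hold the bayesynergy fit.
plot(fit)
```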
which produces monotherapy curves, monotherapy summary statistics, 2D contour plots of the dose-response function \(f\), the non-interaction assumption \(p_0\) and the interaction \(\Delta\). The last plot displays the \(rVUS\) scores as discussed previously, with corresponding uncertainty.
The package can also generate 3D interactive plots by setting plot3D = TRUE. These are displayed using the plotly library (Plotly Technologies Inc. (2015)).
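For example (assuming the fit object from above):

```r
# Interactive 3D surfaces rendered with plotly; `fit` is assumed as before.
plot(fit, plot3D = TRUE)
```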
The synergyscreen function provides a workflow for data from large drug combination screens, where multiple drugs are tested in combination on multiple cell lines. It takes as input a list of experiments, each entry being a list containing the elements needed for a call to the main regression function bayesynergy.
Included in the package is the result of a synergyscreen run of 583 drug combinations on the A-375 human melanoma cell line from ONeil et al. (2016). The synergyscreen object is a list with two entries: a dataframe with parameter estimates from each experiment, and a list named failed, containing experiments that either failed to process entirely or had an unsatisfactory fit.
## [1] 2
We see that the dataset has two experiments that failed to process during an initial run of synergyscreen. There are many reasons why an experiment might fail to process: an input error, initialization problems, or problems with the parallel processing.
The entries of failed are themselves lists, each containing the necessary information to process through the bayesynergy function
## [1] "y" "x" "drug_names" "experiment_ID"
## viability L778123 MK-4827
## [1,] 0.759 0.325 0.223
## [2,] 0.755 0.325 0.775
## [3,] 0.548 0.325 2.750
## [4,] 0.307 0.325 10.000
## [5,] 0.787 0.800 0.223
## [6,] 0.820 0.800 0.775
We can rerun experiments that failed to process by simply passing the returned synergyscreen object back into the function. Note that we turn off the default options of saving each fit and plotting everything, and set method = "vb", indicating that we use variational inference to fit the model.
fit_screen = synergyscreen(ONeil_A375, save_raw = F, save_plots = F, parallel = F,
                           bayesynergy_params = list(method = "vb"))
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000191 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.91 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 41.565 1.000 1.000
## Chain 1: 200 142.057 0.854 1.000
## Chain 1: 300 155.074 0.597 0.707
## Chain 1: 400 159.906 0.455 0.707
## Chain 1: 500 158.289 0.366 0.084
## Chain 1: 600 155.760 0.308 0.084
## Chain 1: 700 156.839 0.265 0.030
## Chain 1: 800 159.863 0.234 0.030
## Chain 1: 900 161.079 0.209 0.019
## Chain 1: 1000 156.419 0.191 0.030
## Chain 1: 1100 161.755 0.094 0.030
## Chain 1: 1200 160.802 0.024 0.019
## Chain 1: 1300 159.675 0.017 0.016
## Chain 1: 1400 156.812 0.015 0.016
## Chain 1: 1500 160.577 0.017 0.018
## Chain 1: 1600 163.765 0.017 0.019
## Chain 1: 1700 158.219 0.020 0.019
## Chain 1: 1800 163.282 0.021 0.023
## Chain 1: 1900 163.890 0.021 0.023
## Chain 1: 2000 162.114 0.019 0.019
## Chain 1: 2100 164.790 0.017 0.018
## Chain 1: 2200 163.331 0.017 0.018
## Chain 1: 2300 162.298 0.017 0.018
## Chain 1: 2400 164.494 0.017 0.016
## Chain 1: 2500 165.955 0.015 0.013
## Chain 1: 2600 163.239 0.015 0.013
## Chain 1: 2700 164.393 0.012 0.011
## Chain 1: 2800 164.131 0.009 0.009 MEAN ELBO CONVERGED MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000164 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.64 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Success! Found best value [eta = 1] earlier than expected.
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -24.356 1.000 1.000
## Chain 1: 200 138.219 1.088 1.176
## Chain 1: 300 140.679 0.731 1.000
## Chain 1: 400 144.068 0.554 1.000
## Chain 1: 500 140.497 0.449 0.025
## Chain 1: 600 145.995 0.380 0.038
## Chain 1: 700 146.182 0.326 0.025
## Chain 1: 800 150.830 0.289 0.031
## Chain 1: 900 149.633 0.258 0.025
## Chain 1: 1000 148.723 0.233 0.025
## Chain 1: 1100 148.771 0.133 0.024
## Chain 1: 1200 150.487 0.016 0.017
## Chain 1: 1300 152.562 0.016 0.014
## Chain 1: 1400 144.801 0.019 0.014
## Chain 1: 1500 149.798 0.020 0.014
## Chain 1: 1600 147.341 0.018 0.014
## Chain 1: 1700 151.137 0.020 0.017
## Chain 1: 1800 150.482 0.017 0.014
## Chain 1: 1900 153.894 0.019 0.017
## Chain 1: 2000 149.611 0.021 0.022
## Chain 1: 2100 151.782 0.022 0.022
## Chain 1: 2200 150.222 0.022 0.022
## Chain 1: 2300 147.617 0.023 0.022
## Chain 1: 2400 151.502 0.020 0.022
## Chain 1: 2500 150.379 0.017 0.018
## Chain 1: 2600 152.102 0.017 0.018
## Chain 1: 2700 152.839 0.015 0.014
## Chain 1: 2800 149.736 0.016 0.018
## Chain 1: 2900 151.451 0.015 0.014
## Chain 1: 3000 152.386 0.013 0.011
## Chain 1: 3100 152.474 0.012 0.011
## Chain 1: 3200 151.970 0.011 0.011
## Chain 1: 3300 151.874 0.009 0.007 MEAN ELBO CONVERGED MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
We can also plot the result of the screen:
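A sketch, using the synergyscreen object from the rerun above:

```r
# Screen-level summary plots; `fit_screen` is the synergyscreen object above.
plot(fit_screen)
```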
Sometimes, the bayesynergy function may return with a warning. Ideally, we want no warnings at all; any warning should be examined closely, as the posterior samples could be unreliable. Usually, the warning will tell the user how to fix the problem at hand, e.g. by running the chains for longer (setting iter higher), or setting adapt_delta higher. See https://mc-stan.org/misc/warnings.html for some general tips.
Most commonly, the sampler might complain about divergent transitions. The warning will typically look like this:
## Warning: There were 2316 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
This is indicative of a posterior geometry that is tricky to explore. In the case where there are only a few divergent transitions, the usual trick is to set adapt_delta to a value higher than its default of 0.9. This can be done through the control option in the bayesynergy call:
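For example (a sketch; the exact value of adapt_delta is illustrative):

```r
# Raise adapt_delta via the control option, which is passed through to Stan.
fit = bayesynergy(y, x, control = list(adapt_delta = 0.99))
```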
However, the case above, where there are 2316 divergent transitions, is indicative of a misspecified model. In my experience, this can happen for a few reasons.
One common cause is poorly identified lower asymptotes of the monotherapy curves; these can be fixed to zero by setting lower_asymptotes = FALSE in the call. Unless one is specifically interested in these parameters, there is no reason to estimate them – the model fit will typically still be good without them.

Cremaschi, Andrea, Arnoldo Frigessi, Kjetil Taskén, and Manuela Zucknick. 2019. “A Bayesian Approach to Study Synergistic Interaction Effects in in-Vitro Drug Combination Experiments.” http://arxiv.org/abs/1904.04901.
Flaxman, Seth, Andrew Gelman, Daniel Neill, Alex Smola, Aki Vehtari, and Andrew Gordon Wilson. 2015. “Fast Hierarchical Gaussian Processes.” Manuscript in Preparation.
ONeil, Jennifer, Yair Benita, Igor Feldman, Melissa Chenard, Brian Roberts, Yaping Liu, Jing Li, et al. 2016. “An Unbiased Oncology Compound Screen to Identify Novel Combination Strategies.” Molecular Cancer Therapeutics 15 (6): 1155–62. https://doi.org/10.1158/1535-7163.mct-15-0843.
Plotly Technologies Inc. 2015. “Collaborative Data Science.” Montreal, QC: Plotly Technologies Inc. https://plot.ly.
Stan Development Team. 2020. “RStan: The R Interface to Stan.” http://mc-stan.org/.
Yadav, Bhagwan, Tea Pemovska, Agnieszka Szwajda, Evgeny Kulesskiy, Mika Kontro, Riikka Karjalainen, Muntasir Mamun Majumder, et al. 2014. “Quantitative Scoring of Differential Drug Sensitivity for Individually Optimized Anticancer Therapies.” Scientific Reports 4 (1). https://doi.org/10.1038/srep05193.